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Abstract — It has been shown that approximate message passing 
algorithm is effective in reconstruction problems for compressed 
sensing. To evaluate dynamics of such an algorithm, the state 
evolution (SE) has been proposed. If an algorithm can cancel the 
correlation between the present messages and their past values, 
SE can accurately tract its dynamics via a simple one-dimensional 
map. In this paper, we focus on dynamics of algorithms which 
cannot cancel the correlation and evaluate it by the generating 
functional analysis (GFA), which allows us to study the dynamics 
by an exact way in the large system limit. 

I. Introduction 

Dynamics of iterative algorithms for compressed sensing is 
discussed in this paper. We consider a problem that an N— 
dimensional vector x e M. N is reconstructed from an M- 
demensional (M < N) vector y g M. M : 



y = Ax + u>, 



through an given M x N matrix A € R MxN . Here, u£l M 
denotes a noise vector u> ~ jV(0, of,/). Since the ratio S = 
M/N, which is called the compression rate, is less than one, 
the system of equations undetermined. The original vector Xq 
may be however reconstructed if we have some knowledge of 
it, namely the sparsity. This problem 0, 11241 . ifTTI is termed 
the reconstruction problem of compressed sensing llTZl . 10, 
0, 0. 

To solve such undetermined systems, linear programming 
(LP) methods is widely applied and is investigated its per- 
formance H2), 0, 0, 0. However, the LP might be still 
expensive to solve the large scale reconstruction problems. 
Recently, Donoho et al. have suggested an iterative algorithm 
which is called the approximate message passing algorithm 
(AMP) lfl3l . They have also proposed SE to evaluate its per- 
formance and have shown that the reconstruction performance 
of AMP is identical to that of the LP -based reconstruction 1 13 1. 
Bayati and Montanari have provided the rigorous foundation 
to SE and have shown that SE can be applied to a general 
class of algorithms on dense graph, namely algorithms which 
can cancel the correlation between the present messages and 
its past values [1|. This correlation is often called a retarded 
self-interaction, which is caused by iterations, or the Onsager 
reaction. 

Contrary to success of analysis for AMP, analysis for algo- 
rithms which cannot cancel the correlation between the present 
messages and their past values, e.g., the iterative shrinkage- 
thresholding algorithm (1ST) iTPJl . [27], is not discussed 
enough. In this case, we have to treat complex correlation. 
We focus on dynamics of algorithms which cannot cancel such 
correlation and evaluate the dynamics of 1ST by applying GFA 
ifTOl . lfT31 . 0, 0. Dynamics, that appears in the information, 
has drawn attention so far |B51. llltl . 11251 . Ifl8l . |fl9l . 



[21]. In GFA, we assume that the generating functional is 
concentrated around its average over the randomness in the 
large system limit, and we use the saddle-point methods to 
calculate the generating functional asymptotically. An advan- 
tage of GFA is to be able to evaluate dynamics of nonlinear 
systems exactly for the first few stages. To evaluate long time 
dynamics, approximation schemes may, on the other hand, 
have to be employed due to the computational cost. 

This paper is organized as follows. The next section intro- 
duces reconstruction algorithms. Section III and IV explains 
about analysis and experiments, respectively. The final section 
is devoted to a summary. 

II. Settings and Algorithms 
We assume the following to simplify the problem. Each 

N 



(1) element of the original vector Xq — (xo, n ) & Mr, is an i.i.d 



random variable which obeys the distribution p(x) = (1 
p)S(x) + p(27r) -1 / 2 exp(— x 2 /2) with a given signal density 
P (0 < p < 1), where 5(x) denotes Dirac's delta function. 
Each element of the compression matrix A — (a mn ) S j^mxn 
is an i.i.d. Gaussian random variable of mean zero and variance 
M-\ i.e., a mn -A^Af- 1 ). 

Donoho et. al. have developped the following iterative algo- 
rithm achieving the performance of LP-based reconstruction. 

Definition 1: Starting from an initial guess x^ = and 
z (o) — ^ approximate message passing (AMP) algorithm 
iteratively proceeds by 



x 



(2) 



%(A T *W + sW), 
zW = y-Ax® 

+ \z^(rj'(A T z ( - t ^+x^)}. (3) 
o 

Here, {i] t } is an appropriate sequence of threshold functions 
(applied componentwise), the current estimate 



of the original vector Xq, A denotes the transpose of A and 

r]' t (u) — dr] t (u)/du. For a vector v = (vi, ■ ■ ■ ,vn), (v) = 

One of other popular iterative algorithms 11261 has the follow- 
ing form. 

Definition 2: Starting from an initial guess x^ = 0, 
the iterative shrinkage-thresholding algorithm (1ST) iteratively 
proceeds by 



x 



(t+i) = 
At) 



v-Ax®. 



(4) 
(5) 

□ 



Here, the parameter c > 1, which also appears in the 
separable surrogate functionals (SSF) method [9|, is intro- 
duced to make 1ST be easy to converge. When c = 1, the 
difference between AMP and 1ST is only whether the term 
^z^- 1) (r/' t _ 1 (A T z ( - t -^ + x^" 1 ))) exists or not. The 1ST 
lacks this term which can cancel the correlation between 
the present messages and their past values. Due to this, the 
summation of massages cannot be regarded as a Gaussian 
random variable. It cannot therefore hope that the shrinkage- 
thresholding works properly. While the property of AMP 
is investigated theoretically and thoroughly lfl3l . [T], The 
dynamics of such algorithms for the reconstruction problem 
is not discussed enough so far. 

III. Analysis 

The goal of our analysis is to evaluate the mean squared 
error (MSE) per component. 

We analyze the dynamics in the large system limit where 
N,M — » oo, while the compression rate S is kept finite. 
The dynamics is a Markov chain, so the path probability 
p[x(°\ ■ ■ ■ , x^'], which is often called path probability, are 
simply given by products of the individual transition proba- 
bilities of the chain: 



Plx^,--- ,x«] =5{x^]l[S[x^ +1 



s=0 



(±A T (y- Ax«) + x« +0W 



(6) 

which is called the path probability. Here, Sw is an exter- 
nal message which is introduced to evaluate the response 
function and these parameters {O ,--- ,6^} are set to be 
zero in the end of analysis. The initial state probability 
becomes p[x(°'] = Hn=i S[xn ]• Therefore, we can cal- 
culate an expectation with respect to an arbitrary function 
Q = Q(x(°\--- ,xW) of tentative decisions as = 
Jr(*+i)jv (Il!=o dx^) p[x^\--- ,x^]g, where x denotes a 
set {a^ ),-" and Ex denotes the expectation with 

respect to a random variable X. To analyze the dynamics of 
the system we define the following functional that is called 
the generating functional. 

Definition 3: The generating functional Z[ip] is defined by 



typical behavior of the system depends only on the 
statistical properties of the random variables. We 
therefore evaluate the averaged generating functional 

Z\i>] = E xA ^UeM-^EUo x(s) ■ ^ (s) ])< where M 

denotes an expectation over {A, Xo,u>}. Evaluating the 
averaged generating functional, one can obtain important 
parameters which describe the algorithm performance. 
Namely, we can evaluate the overlap, which is also called 
the direction cosine, between he original vector Xo and the 
current estimate x^ and the second moment of the current 



estimate. Since 



|*o-* (t) ||2 = |l*o| 



:-2xW- 



x 



X 



Will, 



we can evaluate MSE from the overlap and the second 
moment. Here, x^' ■ Xq denotes the inner product between 
x« and Xo. One finds the following proposition. 

Proposition 1: For 1ST with an arbitrary sequence of 
threshold functions {rj s Y s =07 MSE per component of of the 
current estimate x^ can be assessed as 



of=JV "-EjB^^^dlsBo - x 
= p-2m<'> +<?<*•'>, 



(*)||2^ 



(ID 



exp 



in the large system limit, i.e., N —> oo, where the parameters 
are given as follows. 

= ((x x^)), (12) 
(?(»,«') = iiW/)^ (B) 
G (s,s') = ^ x (s) ^ R -^( s ')))i( s > s ') 5 (14) 

where I(V) denotes an indicator function which takes 1 if the 
proposition V is true, otherwise. Here, the average over the 
effective path measure ((• • ■)) is given by 

((g(x,v))) ±E Xa Vv J(f[dx^g(x,v) S[x^] 

t-l s 

x J| ( 5[a;( s + 1 ) - r] s (xok^ + «<») + (Tx)^)] I , 

s=0 ' 

(15) 

where Vv = |2tt.R|- 1/2 dv exp[-|w • R~ x v], R 



iVasM.^M ), (7) c- 2 {l + (cSy 1 G T )- 1 D(l + (cS)- 1 G)- 1 ,T = c- 1 (c-l)l 

s =o I' +c- 1 {l + (cd)- 1 G)- 1 (c5)- 1 G and kM> = cr^A^. Each 



where ^ = (t/tf\, ■ ■ ■ ,i$) T - □ 
In familiar way [10|, [6|, [18|, one can obtain all averages of 
interest by differentiation, e.g., 



i lim 



dZ[ij>] 
dZ[xp] 

dZty] 



lim 



i lim 



86 



(«') 



(8) 
(9) 

(10) 



from Z[ip]. We assume that the generating functional 
is concentrated to its average over the random variables 
{A, Xq,uj} in the large system limit, namely the 



c _1 (l + {c6)- 1 G)- 1 (c8)- 1 G and = cr^A^. Each 
entry of D is L>( s - S ') = a'l + S- 1 [p - - m( s '> + C( s - S ')] 
and each entry of A[ s ] is A{ s ' s ' = S s , s > + (1 — 6 StS >)(6 s > t8 » + 
(cSy-G^'" >"')). The terms (R^vY^ and (rer)( s ) denote the 
s th element of the vector R ~ l v and Tcr, respectively. □ 
Outline of derivation is available in Appendix A. The pa- 
rameters m( s ' and C^ s,s ) are referred to as the overlap and the 
correlation function, respectively. Especially, C^ s ' s ' gives the 
second moment of the s th estimate. In GFA, we extract a one- 
dimensional iterative process which is statistically equivalent 
to the original A^-dimensional iterative process. The effective 
path measure ((•••)) is an expectation operator with respect 
to such a one-dimensional process. Proposition Q] entirely 
describe the dynamics of the system. The term (IV) ( s ) in ( fl~5b 
is called the retarded self-interaction or the Onsager reaction 
term. 



IV. Experiments 

To validate the results obtained above, we performed nu- 
merical experiments in N = 2, 000 systems. For sparse signed 
original vectors, the sequence of the threshold functions |fl3l 
is chosen as r] t (x; X t ) — (x~X t )I(x > Xt) + (x+X t )I{x < X t ) 
with At = Xut /c, where A is a threshold control parameter and 
at is MSE per component of the current estimate. In practice, 
we cannot use a true MSE, since we do not know the original 
vector. We therefore have to use an alternate value instead 
of the true MSE. The MSE on zeros of = E xo [I(a;o,„ = 



0)(4) 2 |* 



0], which is referred to as MSEZ, is one 



of useful alternate values which are easy to estimate. We set 
iTq = p for an initial value. 

Figure Q] shows the first few stages of the dynam- 
ics of 1ST which is predicted by GFA. The parameters 
are set to be (p, 6, A, c) G {(0.1,0.5,3,3), (0.1,0.8,3,1), 
(0.1,0.8,0.5,1)}. The parameter A is not optimized for 1ST. 
Figure [Ha) is a case where the reconstruction is successful. 
The parameter c is set to be c > 1 like the SSF method in 
this case. Since the residue is added little by little, it is easy 
to avoid a vibration behaviour. Figure [TJb) is a case where the 
reconstruction fails. When the parameters are near the region 
where the reconstruction succeeds, a vibration behaviour often 
turns up. Figure [TJc) is also a case where the reconstruction 
fails. When the parameters are far from the region where the 
reconstruction succeeds, MSE generally diverges. The GFA 
prediction is in good agreement with computer simulation 
result. The parameter c is set to one in Fig. [TJb) and Fig. 
QIc), which corresponds to the simple iterative thresholding 
algorithm (ITA) JJJ). 

The self-consistent equations appeared in Proposition 
proposition:IST involve three kinds of parameters {m^ s \ 
C( s ' s \ G^ s ' s '}* s / =0 - The number of these parameters is 
t + 2t 2 and gradually grows as time passes. The other 
parameters can be easily calculated from these. When one 
solve self-consistent equations according to its definition, the 
computational cost for stage t becomes 0(i 2 e*) since each 
parameter, e.g., involves a ^-multiple integral, Approxi- 

mation schemes to evaluate the GFA result might be therefore 
important to capture long time dynamics [14], [15|. 

V. Summary 

We analyzed dynamics of the iterative shrinkage- 
thresholding algorithm for compressed sensing as a typical 
algorithm which cannot cancel the correlation between the 
present messages and their past values. While the state evolu- 
tion plays an important role to understand nature of iterative 
algorithms which can cancel such a correlation exactly, the 
generating functional formalism gives us a analytical method 
to treat iterative algorithms which cannot cancel the corre- 
lation. The result of the generating functional formalism for 
algorithms which can cancel the correlation must give that of 
SE. It is under way to check this property. 

Acknowledgment 

The author would like to thank Andrea Montanari for his 
valuable comments. This work was partially supported by 
a Grant-in- Aid for Scientific Research (C) No. 22500136 
from the Ministry of Education, Culture, Sports, Science and 
Technology (MEXT) of Japan. 



0.12 

0.1 of 
0.08 
0.06 
0.04 
0.02 






theory — E 






simulation 


>■■ V 


\ 

\ MSE ^ 






\ MSEZ 




J ; S 



(a) 




(b) 




(c) 



Fig. 1. The first few stages of the dynamics of 1ST predicted by 
GFA (squares: MSE, trianlges: MSEZ). Computer simulations (circles: MSE, 
Inverted triangles: MSEZ) are evaluated in N = 2, 000 systems, (a) Case 
where the reconstruction succeeds, (p, S, A,c) = (0.1,0.5,3,3). (b) Failure 
case, (p, 8, A, c) = (0.1,0.8,3,1). (c) Another failure case, (p, 5, A, c) = 
(0.1,0.8,0.5, 1). 



Appendix 



A. Outline of analysis 

.(*) 



Let = (uli) be a summation of messages, i.e., 

„(*) A I^T z (t) + x (t) + (t) j where 0(f) is an exter . 

nal message which is introduced to evaluate the response 
function G^ s ' s '. The Dirac's delta function is replaced as 
5(x) = 7(27r) _1 / 2 e~ 7 x I" 1 and the parameter 7 is taken 
the limit 7 — > 00 later. We first separate the summation 
of messages at any iteration step by inserting the following 
delta-distributions: 1 = J SuSu nl=o Iln=i ex P {«n' — 



(iA T z«) n 

s u = n!=o 



n 



W-eP}], where 8u ^ ifeo II 

f dui B) 



N du^ 
1 



and 



/2tt ' 



Here, (a)„ denotes the n th element 



of the vector a. We then have 

Z[ip] =Ex ,A,w( V p[x i0) ] f SuSue-^o 

l x(0),...^< t ) J *' tN 

r t-1 JV 



cc <s '-i/> (s) 



s=0 n=l 



t AT 



x exp 



M t— 1 / JV 



m— 1 s— n— 1 

4ee(e— (s 

m— 1 5*— ^n— 1 
n=l '-IJ 



(16) 



In order to average the generating functional with respect to 
the disorder A and u>, we isolate the spreading codes by 
introducing the variables ,10™: 1 = J SvSv n*=o IIm=i 
exp[iv${v$ -VSJ2n=i 8 m»«l S> }] and 1 = / Swdw J]*=o 
Ilm=i exp[iw^ ) {w^ ) -V$J2n=i a mn{xo,n-%n ]}], where 

r A T-rM T-ft— 1 dl)^ 1 c~ A ttM TT*-1 d*™' r A 

5w = llm=in s =0^' Sv = llm=in s =0 ^ = 

m=i n s=0 -7^' and = n m =i n s =o -Tip ° ne then 

obtains 



< exp 



M t-1 / JV 



4 EEIE^ 

m— 1 s— Vi=l 

4 E E (E a ™""« } ) (E ^"{^o.n - 4 s) i) ) 



SvdvSwSw 



R 4tN 



A/ t-1 



L m— 

xE u exp 



xEa exp 



' 1 it — t_1 

J7l=l S= " 

M £-1 
m— 1 s— n— 1 



JV 



>(•) 



JV 



(17) 



We now can calculate the average of the term in the disorder- 
averaged generating functional. The term E u) {- • • } in (fTTI i 
becomes 



exp 



-^viEE 



M t-1 



exp 



m— 1 s—Q 
M t-1 t-1 



c\l 5 



m— 1 s— s'= 



(18) 



Calculating the average of the term containing the disor- 
der in Z[i/>] , we separate the relevant one-stage and two- 
stage order parameters by inserting: 1 = (^-)* / dmdm 

exp[iiVE:=o * (sl {m( s ) Eti V^}], 1 = (f )* 

Jdfedfc exp[iiVE^o fc (s) {fc (s) -F Eli ^o>i s) }], 



JV 



Eti sW'}], 1 = (^)* 2 JdQdQ exp[iJVE^Ey=o 
Q(^') {Q (^') _^ E ^ =i fiCf)fiCf')}] and 1 = {^f 

JdLdL exptiiVE^oEyio L(^'){L(^') -i E^Li 

■*}]• Since the initial state probability is factorizable, 
the disorder-averaged generating functional factorizes into 
single-site contributions. 

The disorder-averaged generating functional is for N — > oo 
dominated by a saddle -point [8|, [17|. We can thus simplify 
the saddle-point problem to (fT9l l. The disorder-averaged gener- 
ating functional is then simplified to the saddle-point problem 

as 



Z[ip] =E X0 / Amdm&kdkAqdq&QdQdLdL 



x exp 



jV(* + $ + ft) + (9(lnjV) 



(19) 



in which the functions "J, il are given by 



* ^]T{m( s W s ) + k^k^} + ^{g (s ' s V s ' s,) 

s=0 s=0s'=0 

+ Q( S ' S ')Q( S ^') + L^L^'"">} (20) 



n=l k K v s=0 



x exp 



L s=0 

t-i t-i 



s ')t( s ) t ( ;5 ') 



s=0 s'=0 

+ g( s ' s ')M( s )u( s ') + i(^ s ')a; (s) it( s ')} 
t-i 

+ iJ2u {s] W s) - - 9^ - x , n k {s) } 



s=0 
t-1 



i > in.ni' 'm 

s=0 



:E^ W ^ 



s=0 



(21) 



x exp 



SvSvSwdw 

M t-1 



L m— 1 s— 
M t-1 t-1 



I E E E{i^«' } 



2 A^, A^^S 

m—1 s—0 s'—O 
1 M t-1 t-1 

^EEE{^[ fc(s) -^' 

m— 1 s—0 s'—O 

+ t&W[fc(O_L(-V)]0(O} 

. Af t-1 t-1 

^EEE 

m— 1 s—0 s'—O 



{^)[ / ,- m w- m («') +3 ( S y) ] ^') } 



(22) 



where 5m = f] 



t-i d« (s) 



and = ]~| 



l s=0 \/2^_^ lls=0 y27T 

TV — > oo, the integral U9i will be dominated by the saddle 
point of the extensive exponent 'J? + $ + ft. 

One can deduce the meaning of order parameter by deriva- 
tion of the averaged generating functional Z[ip] with respect to 
the external messages {9^} and the dummy functions {ipu^}. 
The averaged generating functional Z[tp] is dominated by a 
saddle-point for N — >• oo. W e can mus simplify ( fT9] l in the 
large system limit. Using Z[0] = 1, From derivatives of the 
averaged generating functional, we find 



(23) 



+ (l-<5n,„0(^ (S) )nIEc C „(x (s ' ) )^ (24) 



j(#n ^ ) 



(25) 



where (•••)« denotes the average as (f(x,u,u)) n = 
[ExCo),..., a w /5n5tt fi n (x,u,u) f(x,u,u)]/ [E a c),-,xW 
/ Hn{x, u, u)} with [i n (x,u,u) = <5[a;( )] exp[ E«=o 
{ln^= -^+D - r, s (^>)] 2 } -iE^ E'7i {g<»- 7 '> 
+g( s ^') u< s ') +L( S ^') a;( s ) w( s ')} +iE!=o 
fiW { U W _ X W _0« -a;o,n* W } -iE^=o ^o,^ (s) m (s) ] 
I saddle- Here, /| saddle denotes an evaluation of a function / 
at the dominating saddle-point. The saddle-point equations are 
derived by differentiation of N(ty + $ + ft) with respect to 
integration variables {m,rh, k,k, q,q, Q,Q, L and L}. 
These equations will involve the average overlap rrv- s \ the 
average single-user correlation C^ s ' s ' and the average single- 
user response function G^ s ' s 



Using the derivatives (1231 - (l25l >. straightforward differen- 
tiation of ^ + $ + ft with respect to m (s) , rh (s) , fc (s) , k( s \ 
q (s,s') t q(s,s')^ Q(B,s') t Q(8,s'), L( s - S ') and L^ s ' s "> leads us to 
the following saddle-point equations: mW = i g 8 ^ | saddle, 
fcW = 

£;( s ) = 0~g( s > s ') =i 



IgjTJijI saddle. m y ' 



lim 



an 



TJ | saddles 9 

an 



(*,»') = H m 



r( s )) 



q(»,<0 = o, L^*') = 



an 



L QQ(s,s') Isaddle 

* eL (s,7J Isaddie and L (s ' s,) = limjv->oo ^ En=i (^ (s) " (s,) ) 
for all s and s'. We then find m( s ) = m^, q( s - s ') = C^ 8 
and L( s > s ') = iG^ s ' s 



It should be noted that the causality 



0, should be hold for s < s , therefore 



d(xM)/d9W 

L (s,s') = q(s,s') = o for s < s-. 

Straightforward differentiation and taking the limit 7 — > 00, 
we then arrive at Proposition [T] 
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